Active networks: an evolution of the internet
Active Networks can be seen as an evolution of the classical model of packet-switched networks. The traditional, “passive” network model is based on a static definition of node behaviour. Active Networks propose an “active” model in which intermediate nodes (switches and routers) can load and execute user code contained in the data units (packets). Active Networks are thus a programmable network model in which bandwidth and computation are both treated as shared network resources. This approach opens up interesting new research fields. This paper gives a short introduction to Active Networks, discusses the advantages they introduce and presents the research advances in this field.
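To make the idea of packets carrying user code concrete, here is a minimal sketch of an intermediate node that interprets a tiny program shipped inside each packet (a "capsule"). The instruction set (`push`, `load`, `least_loaded`, `forward`), the `Capsule` and `ActiveNode` classes and the per-packet step budget are all hypothetical illustrations, not an API from the paper or from any real active-network toolkit; real systems ship far richer code under stronger sandboxing.

```python
from dataclasses import dataclass


@dataclass
class Capsule:
    """A packet that carries its own (hypothetical) program alongside its data."""
    dest: str
    code: list                # list of (opcode, *args) tuples
    payload: bytes = b""


class ActiveNode:
    """Intermediate node that runs capsule code from a whitelisted instruction set."""

    MAX_STEPS = 64  # bound per-packet computation, since CPU is a shared resource

    def __init__(self, name: str, neighbor_load: dict):
        self.name = name
        self.neighbor_load = neighbor_load  # read-only view of local node state

    def process(self, capsule: Capsule) -> str:
        """Execute the capsule's program and return the chosen next hop."""
        stack: list = []
        for step, (op, *args) in enumerate(capsule.code):
            if step >= self.MAX_STEPS:
                raise RuntimeError("capsule exceeded its computation budget")
            if op == "push":
                stack.append(args[0])
            elif op == "load":            # replace a neighbor name with its load
                stack.append(self.neighbor_load[stack.pop()])
            elif op == "least_loaded":    # push the name of the least-loaded neighbor
                stack.append(min(self.neighbor_load, key=self.neighbor_load.get))
            elif op == "forward":         # pop the next hop and stop executing
                return stack.pop()
            else:
                raise ValueError(f"instruction {op!r} not allowed on {self.name}")
        raise ValueError("capsule ended without choosing a next hop")
```

For example, `ActiveNode("R1", {"A": 5, "B": 1, "C": 3}).process(Capsule("D", [("least_loaded",), ("forward",)]))` lets the packet itself decide to leave through the least-loaded neighbor, `"B"`, without the router having been reprogrammed.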
A customizable multi-agent system for distributed data mining
We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability.
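The separation between general-purpose agent protocols and the pluggable mining algorithm, together with message-based asynchronous communication, can be sketched with `asyncio` queues as agent inboxes. Everything here is a hypothetical illustration: the `Message`, `Agent` and `Worker` classes, the message kinds (`"task"`, `"result"`, `"stop"`) and `run_demo` are assumptions, and tasks are dealt round-robin, so the framework's dynamic load balancing is not modeled.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Message:
    kind: str          # hypothetical message types: "task", "result", "stop"
    sender: str
    payload: Any = None


class Agent:
    """General-purpose part: an inbox plus non-blocking message sending."""

    def __init__(self, name: str, registry: dict):
        self.name = name
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.registry = registry
        registry[name] = self

    async def send(self, to: str, kind: str, payload: Any = None) -> None:
        # Asynchronous: the sender only enqueues, it never waits for the receiver.
        await self.registry[to].inbox.put(Message(kind, self.name, payload))


class Worker(Agent):
    def __init__(self, name: str, registry: dict, mine: Callable[[Any], Any]):
        super().__init__(name, registry)
        self.mine = mine  # application-specific plug-in (e.g. a subgraph miner)

    async def run(self) -> None:
        while True:
            msg = await self.inbox.get()
            if msg.kind == "stop":
                return
            if msg.kind == "task":
                await self.send(msg.sender, "result", self.mine(msg.payload))


async def run_demo(tasks, n_workers, mine):
    registry: dict = {}
    master = Agent("master", registry)
    workers = [Worker(f"w{i}", registry, mine) for i in range(n_workers)]
    runners = [asyncio.create_task(w.run()) for w in workers]
    for i, task in enumerate(tasks):
        await master.send(f"w{i % n_workers}", "task", task)
    results = [(await master.inbox.get()).payload for _ in tasks]
    for w in workers:
        await master.send(w.name, "stop")
    await asyncio.gather(*runners)
    return results
```

Swapping the `mine` callable changes the application without touching the messaging code, which is the point of the modular design; results arrive in completion order, not submission order.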
Dynamic load balancing in parallel KD-tree k-means
One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been explored largely in two directions. The first significantly reduces the amount of computation by adopting geometric constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree); these techniques reduce the number of distance computations the algorithm performs at each iteration. The second is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance, an issue that has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested: two approaches are based on a static partitioning of the data set, and a third incorporates a dynamic load balancing policy.
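The geometric pruning behind KD-Tree k-Means can be sketched as one sequential iteration of a "filtering"-style assignment: each tree cell keeps the candidate center closest to its midpoint, discards any center that cannot be nearest to any point in the cell's bounding box, and assigns a whole cell at once when only one candidate survives. This is a generic illustration of the technique, not the paper's parallel formulation; `LEAF_SIZE`, `KDNode`, `_filter` and `kd_kmeans_step` are hypothetical names.

```python
import numpy as np

LEAF_SIZE = 8  # hypothetical leaf capacity


class KDNode:
    """KD-tree cell storing its bounding box and the sum/count of its points."""

    def __init__(self, pts: np.ndarray):
        self.lo, self.hi = pts.min(axis=0), pts.max(axis=0)
        self.count, self.wsum = len(pts), pts.sum(axis=0)
        self.pts = self.left = self.right = None
        if len(pts) <= LEAF_SIZE:
            self.pts = pts
        else:
            dim = int(np.argmax(self.hi - self.lo))   # split the widest dimension
            order = np.argsort(pts[:, dim])
            mid = len(pts) // 2
            self.left, self.right = KDNode(pts[order[:mid]]), KDNode(pts[order[mid:]])


def _filter(node, cands, centers, sums, counts, stats):
    mid = (node.lo + node.hi) / 2
    stats["dist"] += len(cands)
    zs = cands[int(np.argmin(((centers[cands] - mid) ** 2).sum(axis=1)))]
    keep = [zs]
    for z in cands:
        if z == zs:
            continue
        # Box vertex furthest in direction (z - zs): if even this vertex is at
        # least as close to zs, z cannot be the nearest center of any cell point.
        v = np.where(centers[z] > centers[zs], node.hi, node.lo)
        stats["dist"] += 2
        if ((v - centers[z]) ** 2).sum() < ((v - centers[zs]) ** 2).sum():
            keep.append(z)
    if len(keep) == 1:                  # whole cell goes to zs in one step
        sums[zs] += node.wsum
        counts[zs] += node.count
    elif node.pts is not None:          # leaf: assign its points explicitly
        c = np.array(keep)
        dd = ((node.pts[:, None, :] - centers[c][None, :, :]) ** 2).sum(axis=2)
        stats["dist"] += dd.size
        owner = c[np.argmin(dd, axis=1)]
        np.add.at(sums, owner, node.pts)
        np.add.at(counts, owner, 1)
    else:
        for child in (node.left, node.right):
            _filter(child, np.array(keep), centers, sums, counts, stats)


def kd_kmeans_step(root, centers):
    """One Lloyd iteration; returns new centers, cluster sizes and distance count."""
    k, dim = centers.shape
    sums, counts, stats = np.zeros((k, dim)), np.zeros(k, dtype=int), {"dist": 0}
    _filter(root, np.arange(k), centers, sums, counts, stats)
    new = centers.copy()
    nz = counts > 0
    new[nz] = sums[nz] / counts[nz, None]
    return new, counts, stats["dist"]
```

How much work each subtree needs depends on how many candidates survive in it, which is data dependent; that is why splitting such a tree statically across processors can leave some of them idle.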
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
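Receiver-initiated load balancing on an irregular search tree can be illustrated with a small round-based simulation: every worker expands one node per round from its local stack, and a worker that runs out of work asks a random peer, which donates the bottom half of its stack. This is a generic work-request scheme under stated assumptions, not the paper's novel algorithm; the synthetic tree (`children`, `MAX_DEPTH`) and all function names are hypothetical.

```python
import random

MAX_DEPTH = 12  # hypothetical depth limit of the synthetic search tree


def children(node):
    """Hypothetical irregular tree: the branching factor varies from node to node."""
    depth, nid = node
    if depth >= MAX_DEPTH:
        return []
    k = 4 if depth == 0 else random.Random(nid).choice([0, 1, 2, 3])
    return [(depth + 1, nid * 5 + i + 1) for i in range(k)]  # ids stay unique


def sequential_count(root=(0, 0)):
    stack, n = [root], 0
    while stack:
        n += 1
        stack.extend(children(stack.pop()))
    return n


def simulate(num_workers, seed=0, root=(0, 0)):
    """Round-based simulation of receiver-initiated load balancing."""
    rng = random.Random(seed)
    stacks = [[] for _ in range(num_workers)]
    stacks[0].append(root)                  # all work starts on one worker
    expanded = [0] * num_workers
    rounds = requests = 0
    while any(stacks):
        rounds += 1
        for w in range(num_workers):
            if stacks[w]:
                node = stacks[w].pop()      # depth-first expansion of local work
                expanded[w] += 1
                stacks[w].extend(children(node))
            else:
                # The idle worker (the receiver) initiates a request to a random peer.
                donor = rng.randrange(num_workers)
                requests += 1
                if donor != w and len(stacks[donor]) > 1:
                    half = len(stacks[donor]) // 2
                    # Donate the bottom of the stack: shallow nodes, likely larger subtrees.
                    stacks[w] = stacks[donor][:half]
                    del stacks[donor][:half]
    return expanded, rounds, requests
```

Because no workload prediction is used, a busy worker never has to decide in advance what to give away; it only reacts when asked, which keeps the overhead low while every worker is busy.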
Effectiveness of landmark analysis for establishing locality in p2p networks
Locality to other nodes on a peer-to-peer overlay network can be established by means of a set of landmarks shared among the participating nodes. Each node independently collects a set of latency measurements to landmark nodes, which are used as a multi-dimensional feature vector. Each peer node uses the feature vector to generate a unique scalar index that is correlated with its topological locality. A popular dimensionality reduction technique is the space-filling Hilbert curve, as it possesses good locality-preserving properties. However, little comparison exists between Hilbert’s curve and other dimensionality reduction techniques. This work carries out a quantitative analysis of their properties. Linear and non-linear techniques for scaling the landmark vectors to a single dimension are investigated: Hilbert’s curve, Sammon’s mapping and Principal Component Analysis have been used to generate a 1-D space with locality-preserving properties. This work provides empirical evidence to support the use of Hilbert’s curve for locality preservation when generating peer identifiers by means of landmark vector analysis. A comparative analysis is carried out with an artificial 2-D network model and with a realistic network topology model with a typical power-law distribution of node connectivity in the Internet. Nearest-neighbour analysis confirms Hilbert’s curve to be very effective in both artificial and realistic network topologies. Nevertheless, the results in the realistic network model show that there is scope for improvement and that better techniques to preserve locality information are required.
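The Hilbert-curve step can be sketched for the two-landmark case: the latency vector is quantized onto an n-by-n grid and the cell is mapped to its position along the curve, so nearby cells tend to receive nearby scalar indices. The mapping below is the standard iterative 2-D Hilbert index algorithm; the quantization range `max_latency_ms`, the curve `order` and the function names are hypothetical choices, and more than two landmarks would need a d-dimensional Hilbert mapping.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so that the sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n-by-n grid (n = 2**k)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def landmark_key(latencies_ms, max_latency_ms=200.0, order=4):
    """Quantize a 2-landmark latency vector (ms, non-negative) and return its index."""
    n = 2 ** order
    cells = [min(n - 1, int(lat / max_latency_ms * n)) for lat in latencies_ms]
    return hilbert_index(n, *cells)
```

Consecutive indices always correspond to grid cells that share an edge, but the converse does not hold: two adjacent cells can lie far apart along the curve, which is one source of the imperfect locality preservation the abstract reports.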
Efficient mining of discriminative molecular fragments
Frequent pattern discovery in structured data is receiving increasing attention in many scientific application areas. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimate is available for task workloads, which follow a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
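Why unpredictable, heavy-tailed task workloads call for dynamic assignment can be shown with a toy comparison: tasks with Pareto-distributed costs are either dealt round-robin before execution (static) or pulled by whichever worker becomes idle first (dynamic list scheduling). The cost distribution, its shape parameter 1.5 and the function names are assumptions for illustration; this does not reproduce the paper's experiments.

```python
import heapq
import random


def static_makespan(costs, p):
    """Tasks are dealt round-robin before execution; no run-time rebalancing."""
    loads = [0.0] * p
    for i, c in enumerate(costs):
        loads[i % p] += c
    return max(loads)


def dynamic_makespan(costs, p):
    """Greedy list scheduling: the first worker to become idle takes the next task."""
    finish = [0.0] * p                  # min-heap of worker finish times
    for c in costs:
        t = heapq.heappop(finish)
        heapq.heappush(finish, t + c)
    return max(finish)


rng = random.Random(42)
costs = [rng.paretovariate(1.5) for _ in range(1000)]  # heavy-tailed task costs
for p in (4, 16, 64):
    print(p, round(static_makespan(costs, p), 1), round(dynamic_makespan(costs, p), 1))
```

The dynamic scheme needs no cost estimates, and its makespan is guaranteed to stay within one largest task of the ideal `sum(costs) / p`; a single huge task still bounds both schemes from below, which is one reason irregular miners also split large tasks at run time.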
A parallel genetic algorithm for the Steiner Problem in Networks
This paper presents a parallel genetic algorithm for the
Steiner Problem in Networks (SPN). Several previous papers
have proposed the adoption of GAs and other
metaheuristics to solve the SPN, demonstrating the
validity of their approaches. This work differs from them
in two main respects: the size and the
characteristics of the networks adopted in the experiments,
and the purpose for which it was conceived. The motivation
for this work was to build a benchmark
for validating deterministic and computationally
inexpensive algorithms that can be used in practical
engineering applications, such as multicast
transmission in the Internet. In addition, the large
size of our sample networks requires
a parallel implementation of the Steiner GA, which is
able to deal with such large problem instances.
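A common GA encoding for the SPN, sketched here as an assumption rather than the paper's exact design, uses one bit per non-terminal vertex: the selected vertices plus all terminals induce a subgraph, and the fitness is the weight of its minimum spanning tree (infinite if disconnected). The sketch is sequential; since fitness evaluations are independent, they are the natural part to distribute in a parallel version. Population size, rates and all function names are hypothetical.

```python
import random


def make_graph(edges):
    """Undirected weighted adjacency dict from (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def mst_cost(nodes, adj):
    """Prim's algorithm on the subgraph induced by `nodes`; inf if disconnected."""
    nodes = set(nodes)
    seen, cost = {next(iter(nodes))}, 0.0
    while len(seen) < len(nodes):
        best = None
        for u in seen:
            for v, w in adj[u].items():
                if v in nodes and v not in seen and (best is None or w < best[0]):
                    best = (w, v)
        if best is None:
            return float("inf")
        cost += best[0]
        seen.add(best[1])
    return cost


def steiner_ga(adj, terminals, pop_size=20, generations=40, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    steiner = sorted(set(adj) - set(terminals))  # one gene per candidate vertex
    cache = {}

    def fitness(bits):
        key = tuple(bits)
        if key not in cache:
            chosen = set(terminals) | {s for s, b in zip(steiner, bits) if b}
            cache[key] = mst_cost(chosen, adj)
        return cache[key]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in steiner] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]                          # elitism: never lose the best tree
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(0, len(steiner))   # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in p1[:cut] + p2[cut:]]
            nxt.append(child)
        pop = nxt
        best = min(pop, key=fitness)
    return best, fitness(best)
```

On a toy graph where terminals A, B and C form a triangle of weight-10 edges and a hub S reaches each of them at weight 3, the GA should select S and return a tree of cost 9 instead of the terminal-only MST of cost 20.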
A fuzzy approach for the network congestion problem
In recent years, the unpredictable growth of the Internet has further highlighted the congestion problem, one of the problems that have historically affected the network. This paper deals with the design and evaluation of a congestion control algorithm that adopts
a Fuzzy Controller. The analogy between Proportional Integral (PI) regulators and Fuzzy Controllers is discussed, and a method to determine the scaling factors of the Fuzzy Controller is presented. It is shown that
the Fuzzy Controller outperforms the PI regulator under traffic conditions that differ from those of the operating point considered in the design.
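The PI/fuzzy analogy can be made concrete with a minimal incremental fuzzy controller: the error `e` and its change `de` (e.g. a queue-length error in congestion control, an assumption here) are scaled by `Ke` and `Kd`, fuzzified into Negative/Zero/Positive sets, combined through a rule table with product inference, and the result is scaled by `Ku`. With the linear rule table below, the controller reproduces an incremental PI law du = Kp·de + Ki·e with Kp = Ku·Kd/2 and Ki = Ku·Ke/2 (sampling time absorbed into Ki) while the scaled inputs stay in [-1, 1], and it saturates outside that range. The rule table and scaling values are hypothetical, not the paper's design.

```python
def _clip(x):
    return max(-1.0, min(1.0, x))


def _memberships(x):
    """Negative / Zero / Positive sets on [-1, 1]; the degrees always sum to 1."""
    return {-1: max(0.0, -x), 0: 1.0 - abs(x), 1: max(0.0, x)}


# Hypothetical linear rule table: output singleton for (error label, change label).
RULES = {(i, j): (i + j) / 2 for i in (-1, 0, 1) for j in (-1, 0, 1)}


def fuzzy_pi_increment(e, de, Ke, Kd, Ku):
    """Incremental fuzzy controller: returns the control increment du."""
    mu_e = _memberships(_clip(Ke * e))    # input scaling factor Ke
    mu_d = _memberships(_clip(Kd * de))   # input scaling factor Kd
    num = den = 0.0
    for i, we in mu_e.items():
        for j, wd in mu_d.items():
            w = we * wd                   # product inference
            num += w * RULES[(i, j)]
            den += w
    return Ku * num / den                 # output scaling factor Ku (den is 1 here)
```

Choosing `Ke`, `Kd` and `Ku` from a PI design tuned at one operating point is one way to read the scaling-factor problem; replacing the linear table with a nonlinear one is what lets a fuzzy controller depart from PI behaviour away from that point.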
Distributed mining of molecular fragments
In real-world applications, sequential algorithms for
data mining and data exploration are often unsuitable for
datasets of enormous size, high dimensionality and complex
data structure. Grid computing promises unprecedented
opportunities for unlimited computing and storage resources. In this context, there is a need to develop
high-performance distributed data mining algorithms.
However, the computational complexity of the problem and
the large amount of data to be explored often make the design of large-scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.
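What makes a fragment "discriminative" can be sketched as a two-threshold filter: a fragment is kept if it occurs in at least a fraction `min_focus` of the active molecules and in at most a fraction `max_complement` of the inactive ones. The sketch assumes the fragment occurrences are already known (the subgraph isomorphism work a real miner performs is not shown), and the thresholds, function name and toy data are hypothetical.

```python
def discriminative_fragments(occurrences, actives, inactives,
                             min_focus=0.5, max_complement=0.1):
    """Keep fragments frequent among actives and rare among inactives.

    occurrences: fragment -> set of molecule ids containing it (assumed
    precomputed by a subgraph miner; isomorphism tests are not shown here).
    """
    selected = {}
    for frag, mols in occurrences.items():
        focus = len(mols & actives) / len(actives)
        complement = len(mols & inactives) / len(inactives)
        if focus >= min_focus and complement <= max_complement:
            selected[frag] = (focus, complement)
    return selected


actives = {"m1", "m2", "m3", "m4"}
inactives = {f"m{i}" for i in range(5, 15)}
occurrences = {
    "frag_a": {"m1", "m2", "m3"},          # common in actives, absent elsewhere
    "frag_b": {"m1", "m5", "m6", "m7"},    # too rare among actives
    "frag_c": {"m1", "m2", "m5", "m6"},    # too common among inactives
}
print(discriminative_fragments(occurrences, actives, inactives))
```

Only the focus threshold can prune the search: support can only shrink as a fragment is extended, so a branch whose focus support falls below `min_focus` can be cut, whereas the complement condition must be checked on each candidate.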
Adaptive routing in active networks
New conceptual ideas on network architectures have been proposed in the recent past. Current store-and-forward
routers are replaced by active intermediate systems,
which are able to perform computations on transient packets,
in a way that is very helpful for developing and
deploying new protocols in a short time. This paper introduces a new routing algorithm, based on a congestion
metric and inspired by the behavior of ants in nature. The
use of the Active Networks paradigm associated with a cooperative learning environment produces a robust, decentralized algorithm capable of adapting quickly to changing conditions.
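Ant-inspired routing is often built on a probabilistic routing table that ants reinforce: a forward ant samples a next hop, and when it returns, the probability of the hop it used is raised by an amount that grows as the measured trip (a congestion proxy) gets shorter, while the other hops are scaled down so the table still sums to 1. The sketch below uses this AntNet-style update as a stand-in; the paper's congestion metric and update rule are not reproduced, and `reward`, `train` and their parameters are hypothetical.

```python
import random


def reinforce(probs, chosen, r):
    """AntNet-style update: raise the chosen hop, scale the others down (sum stays 1)."""
    for hop in probs:
        if hop == chosen:
            probs[hop] += r * (1.0 - probs[hop])
        else:
            probs[hop] -= r * probs[hop]


def reward(trip_delay_ms, scale_ms=2.0, cap=0.5):
    """Hypothetical congestion metric: shorter (less congested) trips earn a larger r."""
    return min(cap, scale_ms / trip_delay_ms)


def train(delays, n_ants=500, seed=1):
    """Route ants through one node whose next hops have the given (fixed) delays."""
    rng = random.Random(seed)
    probs = {hop: 1.0 / len(delays) for hop in delays}
    for _ in range(n_ants):
        hops = list(probs)
        hop = rng.choices(hops, weights=[probs[h] for h in hops])[0]  # forward ant
        reinforce(probs, hop, reward(delays[hop]))                     # backward ant
    return probs
```

Because every hop keeps a non-zero probability, ants continue to probe the slower path; if its delay drops, its rewards grow and the table shifts back, which is what allows such schemes to adapt to changing conditions without central coordination.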